irrelevant variable
Variational Garrote for Statistical Physics-based Sparse and Robust Variable Selection
Soh, Hyungjoon, Lee, Dongha, Periwal, Vipul, Jo, Junghyo
Identifying relationships between variables is a fundamental task in science. Among various approaches, linear regression plays a central role in linking explanatory variables to dependent variables in statistical modeling [1, 2]. Linear regression is useful in physics [3, 4] for extracting equations of motion from time series data [5] and for predicting trends in dynamical systems [6], and its simplicity, interpretability, and predictive power make it a cornerstone of data analysis [7], forecasting [8], and decision-making [9] in many fields. Moreover, linear regression forms the foundation for many advanced statistical and machine learning models [10], including logistic regression [11], support vector machines [12], and generalized linear models [13]. Extensions of linear regression often aim to capture more complex relationships by introducing higher-order polynomial terms or additional nonlinear transformations. Modern developments in machine learning have enabled the training of deep and highly overparameterized models capable of modeling intricate patterns far beyond the reach of simple linear approaches. In particular, deep learning models can be interpreted as sophisticated forms of nonlinear regression [14], capable of approximating complex functions with high flexibility. Despite its utility, linear regression struggles with modern high-dimensional datasets where only a small subset of variables is truly informative.
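The setting the abstract describes can be sketched in a few lines. This is our toy example, not the paper's model: ordinary least squares on data where only 2 of 10 explanatory variables are informative, so most fitted coefficients should come out near zero.

```python
import numpy as np

# Toy sparse-regression setup (our construction): only the first two of
# ten explanatory variables actually drive the response.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:2] = [3.0, -2.0]                      # sparse ground truth
y = X @ true_w + 0.1 * rng.normal(size=n)

# Closed-form least-squares fit: minimizes ||X w - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_hat[:2], 1))                 # close to [ 3. -2.]
```

With enough samples plain least squares recovers the informative coefficients, but it assigns small nonzero weights to every irrelevant variable, which is exactly the failure mode sparse methods such as the garrote address.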
The Post Double LASSO for Efficiency Analysis
Parmeter, Christopher, Prokhorov, Artem, Zelenyuk, Valentin
Big data and machine learning methods have become commonplace across economic milieus. One area that has not yet received as much attention on these important topics is efficiency analysis. We show how the availability of big (wide) data can actually make the detection of inefficiency more challenging. We then show how machine learning methods can be leveraged to adequately estimate the primitives of the frontier itself as well as inefficiency, using the 'post double LASSO' and deriving Neyman orthogonal moment conditions for this problem. Finally, an application is presented to illustrate key differences of the post double LASSO compared to other approaches.
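The post double LASSO recipe the abstract invokes can be sketched as follows. This is a hedged illustration on our own toy data, with a plain ISTA lasso solver rather than the authors' estimator: select controls that predict the outcome, then controls that predict the treatment, and run OLS of the outcome on the treatment plus the union.

```python
import numpy as np

# Plain ISTA solver for the lasso (our helper, for self-containment):
# gradient step on least squares followed by soft-thresholding.
def lasso(X, y, alpha, iters=2000):
    n, p = X.shape
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n         # Lipschitz constant of the gradient
    for _ in range(iters):
        z = w - (X.T @ (X @ w - y) / n) / L   # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)
    return w

rng = np.random.default_rng(1)
n, p = 500, 50
X = rng.normal(size=(n, p))                   # many candidate controls
d = X[:, 0] + rng.normal(size=n)              # treatment driven by X[:, 0]
y = 2.0 * d + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Step 1 & 2: lasso of y on X and of d on X; keep the union of supports.
selected = np.union1d(np.flatnonzero(lasso(X, y, 0.1)),
                      np.flatnonzero(lasso(X, d, 0.1)))

# Step 3: OLS of y on d plus the selected controls.
Z = np.column_stack([d, X[:, selected]])
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(round(theta[0], 1))                     # close to 2.0, the true effect
```

Using the union of the two selected sets is what makes the final OLS step robust to moderate selection mistakes in either equation.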
On the Effects of Irrelevant Variables in Treatment Effect Estimation with Deep Disentanglement
Khan, Ahmad Saeed, Schaffernicht, Erik, Stork, Johannes Andreas
Estimating treatment effects from observational data is paramount in healthcare, education, and economics, but current deep disentanglement-based methods for addressing selection bias handle irrelevant variables insufficiently. We demonstrate in experiments that this leads to prediction errors. We disentangle pre-treatment variables with a deep embedding method and explicitly identify and represent irrelevant variables, in addition to instrumental, confounding, and adjustment latent factors. To this end, we introduce a reconstruction objective and create an embedding space for irrelevant variables using an attached autoencoder. Instead of relying on serendipitous suppression of irrelevant variables as in previous deep disentanglement approaches, we explicitly force irrelevant variables into this embedding space and employ orthogonalization to prevent irrelevant information from leaking into the latent space representations of the other factors. Our experiments with synthetic and real-world benchmark datasets show that we can better identify irrelevant variables and more precisely predict treatment effects than previous methods, while prediction quality degrades less when additional irrelevant variables are introduced.
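One common way to operationalize the orthogonalization the abstract mentions is a cross-correlation penalty between representation spaces. This is our own construction, not the authors' code: the squared Frobenius norm of the cross-correlation matrix is zero exactly when every dimension of one embedding is uncorrelated with every dimension of the other.

```python
import numpy as np

# Orthogonality penalty between two representations (our sketch): penalize
# the squared Frobenius norm of their cross-correlation matrix.
def orthogonality_penalty(A, B):
    A = (A - A.mean(0)) / A.std(0)
    B = (B - B.mean(0)) / B.std(0)
    C = A.T @ B / len(A)                      # cross-correlation matrix
    return float((C ** 2).sum())

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 4))                # stand-in "relevant" factors
independent = rng.normal(size=(1000, 3))      # leakage-free embedding
leaky = z[:, :3] + 0.1 * rng.normal(size=(1000, 3))  # copies z's dimensions

print(orthogonality_penalty(z, independent) < orthogonality_penalty(z, leaky))
```

Minimizing such a penalty during training pushes the irrelevant-variable embedding away from the instrumental, confounding, and adjustment factors.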
Determination of class-specific variables in nonparametric multiple-class classification
Chen, Wan-Ping Nicole, Chang, Yuan-chin Ivan
As technology has advanced, collecting data via automatic collection devices has become popular, so we commonly face data sets with very many variables, especially when these data sets are collected without specific research goals beforehand. It has been pointed out in the literature that the difficulty of high-dimensional classification problems is intrinsically caused by too many noise variables that are useless for reducing classification error, offer little benefit for decision-making, and increase complexity and confusion in model interpretation. A good variable selection strategy is therefore a must for using such data well, especially when we expect to use the results in succeeding applications/studies, where the ability to interpret the model is essential. Thus, conventional classification measures, such as accuracy, sensitivity, and precision, cannot be the only performance criteria. In this paper, we propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for each individual class, so that we obtain more information about the classification rule and the character of each class. The proposed method can achieve prediction power approximately equal to that of the Bayes rule while retaining the ability of "model interpretation." We report the asymptotic properties of the proposed method and use both synthesized and real data sets to illustrate its properties under different classification situations. We also separately discuss variable identification and training sample size determination, and summarize those procedures as algorithms so that users can easily implement them in different computing languages.
Matching in Selective and Balanced Representation Space for Treatment Effects Estimation
Chu, Zhixuan, Rathbun, Stephen L., Li, Sheng
A dramatically growing volume of observational data is becoming available across various domains of science and technology, which facilitates the study of causal inference. However, estimating treatment effects from observational data faces two major challenges: missing counterfactual outcomes and treatment selection bias. Matching methods are among the most widely used and fundamental approaches to estimating treatment effects, but existing matching methods perform poorly on high-dimensional data with complicated variables. We propose a feature selection representation matching (FSRM) method based on deep representation learning and matching, which maps the original covariate space into a selective, nonlinear, and balanced representation space, and then conducts matching in the learned representation space. FSRM adopts deep feature selection to minimize the influence of irrelevant variables for estimating treatment effects and incorporates a regularizer based on the Wasserstein distance to learn balanced representations. We evaluate the performance of our FSRM method on three datasets, and the results demonstrate superiority over the state-of-the-art methods.
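The Wasserstein-based balance regularizer the abstract mentions has a simple one-dimensional form. This is our own sketch, not the paper's implementation: for equal-size 1-D samples, the Wasserstein-1 distance is the mean absolute difference of the sorted values, so poorly balanced treated/control groups score higher.

```python
import numpy as np

# 1-D Wasserstein-1 distance for equal-size samples (our sketch): mean
# absolute difference of the order statistics.
def wasserstein_1d(u, v):
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

rng = np.random.default_rng(5)
treated  = rng.normal(0.0, 1.0, 1000)
control  = rng.normal(1.0, 1.0, 1000)         # shifted: poorly balanced
balanced = rng.normal(0.0, 1.0, 1000)         # same distribution as treated

print(wasserstein_1d(treated, control) > wasserstein_1d(treated, balanced))
```

Adding such a distance as a training penalty pulls the representation distributions of the two groups together before matching.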
A study of local optima for learning feature interactions using neural networks
In many fields such as bioinformatics, high energy physics, power distribution, etc., it is desirable to learn non-linear models where a small number of variables are selected and the interaction between them is explicitly modeled to predict the response. In principle, neural networks (NNs) could accomplish this task since they can model non-linear feature interactions very well. However, NNs require large amounts of training data to generalize well. In this paper we study the data-starved regime, where a NN is trained on a relatively small amount of training data. For that purpose we study feature selection for NNs, which is known to improve generalization for linear models. As an extreme case of data with feature selection and feature interactions we study XOR-like data with irrelevant variables. We experimentally observed that the cross-entropy loss function on XOR-like data has many non-equivalent local optima, and the number of local optima grows exponentially with the number of irrelevant variables. To deal with the local minima and perform feature selection, we propose a node pruning and feature selection algorithm that improves the capability of NNs to find better local minima even when there are irrelevant variables. Finally, we show that the performance of a NN on real datasets can be improved using pruning, obtaining compact networks on a small number of features, with good prediction and interpretability.
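The XOR-like setup described above can be generated in a few lines (the parameters here are ours, chosen for illustration): the label depends only on the first two features, and the remaining columns are irrelevant noise.

```python
import numpy as np

# XOR-like data with irrelevant variables (our construction): the label is
# the XOR of the signs of the first two features; the rest is pure noise.
rng = np.random.default_rng(3)
n, n_irrelevant = 400, 8
X = rng.uniform(-1, 1, size=(n, 2 + n_irrelevant))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)   # XOR of the signs

# No single feature carries linear signal: each feature's correlation with
# the label is near zero, which is why linear feature selection fails here
# and the interaction itself must be modeled.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
print(all(c < 0.2 for c in corr))
```

This is what makes the problem hard: relevant and irrelevant features look identical to any marginal (one-feature-at-a-time) screening rule.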
Contingency Training
Vargas, Danilo Vasconcellos, Takano, Hirotaka, Murata, Junichi
When applied to high-dimensional datasets, feature selection algorithms might still leave dozens of irrelevant variables in the dataset. Therefore, even after feature selection has been applied, classifiers must be prepared for the presence of irrelevant variables. This paper investigates a new training method called Contingency Training, which increases accuracy as well as robustness against irrelevant attributes. Contingency training is classifier independent. By subsampling and removing information from each sample, it creates a set of constraints. These constraints help the method automatically find proper importance weights for the dataset's features. Experiments are conducted with contingency training applied to neural networks over traditional datasets as well as datasets with additional irrelevant variables. In all of the tests, contingency training surpassed the unmodified training on datasets with irrelevant variables, and even slightly outperformed it when few or no irrelevant variables were present.
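One hedged reading of the "subsampling and removing information from each sample" step is a feature-masking augmentation; this is our construction, not the authors' exact procedure: derive extra training constraints from each sample by masking random subsets of its features, so the learner cannot rely on any single, possibly irrelevant, attribute.

```python
import numpy as np

# Create masked variants of a sample (our sketch of the subsampling idea):
# each copy has a random subset of features zeroed out ("removed").
def masked_copies(x, n_copies, drop_prob, rng):
    masks = rng.random((n_copies, x.size)) > drop_prob
    return x * masks

rng = np.random.default_rng(4)
x = np.ones(10)
variants = masked_copies(x, n_copies=5, drop_prob=0.3, rng=rng)
print(variants.shape)                         # (5, 10)
```

Training on such variants alongside the originals pressures the classifier toward weights that do not depend on any individual feature surviving the mask.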
Collapsing-Fast-Large-Almost-Matching-Exactly: A Matching Method for Causal Inference
Dieng, Awa, Liu, Yameng, Roy, Sudeepa, Rudin, Cynthia, Volfovsky, Alexander
We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework. Matching methods are heavily used in the social sciences due to their interpretability, but most matching methods in the past do not pass basic sanity checks, in that they fail when irrelevant variables are introduced. Also, past methods tend to be either computationally slow or produce poor matches. The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. To do this, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible.
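The weighted Hamming distance at the core of the method is easy to state. The weights and data below are illustrative, not the paper's learned values: mismatches on important covariates cost more than mismatches on near-irrelevant ones.

```python
import numpy as np

# Weighted Hamming distance between two categorical covariate vectors
# (our sketch): each mismatch contributes that covariate's weight.
def weighted_hamming(a, b, w):
    return float(np.sum(w * (a != b)))

w = np.array([3.0, 1.0, 0.1])                 # third covariate nearly irrelevant
unit      = np.array([1, 0, 1])
control_a = np.array([1, 0, 0])               # differs only on the irrelevant covariate
control_b = np.array([0, 0, 1])               # differs on the important one

# control_a is the better match even though both have one mismatch
print(weighted_hamming(unit, control_a, w) < weighted_hamming(unit, control_b, w))
```

This is why the method passes the sanity check the abstract describes: a mismatch on an irrelevant variable barely affects the distance, so introducing such variables does not distort the matches.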
Variable selection for clustering with Gaussian mixture models: state of the art
Talibi, Abdelghafour, Achchab, Boujemâa, Lasri, Rafik
SAAT Laboratory, University of Abdelmalek Essadi, FPL, Larache, Morocco. Corresponding author: Abdelghafour Talibi, a.talibi@uhp.ac.ma

Mixture models have become widely used in clustering, given the probabilistic framework on which they are based. However, for modern databases characterized by their large size, these models perform disappointingly when specifying the model, making the selection of relevant variables essential for this type of clustering. After recalling the basics of model-based clustering, this article examines variable selection methods for model-based clustering and presents opportunities for improving these methods.

I. INTRODUCTION. Clustering aims to classify the objects of a population into groups, where objects in the same group are similar to each other and objects in different groups are dissimilar. Unlike supervised classification, where the number of groups is known in advance, at least for a sample, in clustering the number of groups is unknown and must be estimated. Many fields of research apply clustering methods to data in order to obtain groups that help in understanding and interpreting the phenomenon studied.
Lazy Arithmetic Circuits
Kazemi, Seyed Mehran (University of British Columbia), Poole, David (University of British Columbia)
Compiling a Bayesian network into a secondary structure, such as a junction tree or an arithmetic circuit, allows for offline computations before observations arrive and for quick inference of the marginals of all variables. However, query-based algorithms, such as variable elimination and recursive conditioning, which compute the posterior marginal of a few variables given some observations, allow pruning of irrelevant variables, which can reduce the size of the problem. Madsen and Jensen show how lazy evaluation of junction trees can allow both compilation and pruning. In this paper, we adapt lazy evaluation to arithmetic circuits, allowing the best of both worlds: pruning due to observations and query variables, as well as compilation, while exploiting local structure and determinism.
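The pruning of irrelevant variables the abstract refers to can be seen in a toy example (ours, not the paper's algorithm): in a chain A -> B -> C, querying the marginal of A with no evidence lets B and C be pruned entirely, because summing a barren node's conditional probability table over its own values gives 1.

```python
import numpy as np

# Chain Bayesian network A -> B -> C (our toy CPTs).
p_a = np.array([0.3, 0.7])
p_b_given_a = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows indexed by A
p_c_given_b = np.array([[0.5, 0.5], [0.4, 0.6]])   # rows indexed by B

# Full elimination over the joint P(a, b, c): sum out C, then B.
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]
full = joint.sum(axis=(1, 2))

# The barren nodes B and C sum out to 1, so the answer is just P(A):
print(np.allclose(full, p_a))                 # True: pruning B and C is exact
```

A compiled structure that cannot exploit this must still carry B and C; the paper's lazy evaluation is what lets the arithmetic circuit drop them.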